Dynomotion

Group: DynoMotion Message: 15000 From: tesotronics Date: 9/4/2017
Subject: Incidental watchdog triggers

Hi Tom,

 

I’ve been asked by a customer to debug his machine, which has been converted to KMotion (KFLOP + Kanalog) by someone else. I’ve been working on it for quite some time now and the machine has a hard-to-find bug. Sometimes (about 1 out of 50 runs of the same G-code) the KFLOP crashes. The “SWE” watchdog pin goes high and the KFLOP becomes unreachable.

 

I have taken quite some measures so far, without any improvement. Here’s what I’ve done up till now:

 

-Added diodes to all relays (these were missing) to make sure the recirculating current path exists and that the wires in the path are as short as possible

-Rewired lots of cabling to get rid of ground loops

-Changed the power supply to a completely isolated power supply (floating ground)

-Added common mode filtering to power supply lines

-Added galvanic isolation to all switch outputs (board now uses the opto outputs instead of the FET outputs)

-Changed from a laptop type setup with very long USB cable to a dedicated mini-PC with a USB cable of only 8 inches.

-Added the USB isolator from Olimex

-Added additional capacitors to the reset lines of KFLOP (read that in an on-line post somewhere)

 

Here’s what I still plan to do before my box of hardware tricks runs out:

 

-Add common mode filters to the output DACs which control the axes

-Take radiated EMC emission measures (add grounded enclosure around KFLOP + Kanalog)

-Add ferrite chokes around the I/O to the encoders

 

I have another very similar machine that runs just fine, without most of the measures from the list above. I’m beginning to suspect that it might not be a hardware problem, since all the measures so far have not improved the situation even the slightest bit.

 

So here are my questions:

-Is it possible that the watchdog triggers because of a SW problem (perhaps I should look for a division by zero or something in the code that has been written by the guy who did the conversion)?

-Have you had a customer with a similar “incidental watchdog trigger” issue, and if so, how was it solved?

-Do you have any suggestions how to further go ahead with problem solving of this issue?


Group: DynoMotion Message: 15001 From: Tom Kerekes Date: 9/5/2017
Subject: Re: Incidental watchdog triggers

Sorry to hear you are having such problems.

I haven't heard of anyone else with anything similar.

It sounds like KFLOP is crashing due to either some hardware or software problem which then stops communication to Kanalog which then triggers the watchdog.

Everything you have done seems reasonable to me.

You might try swapping out KFLOP to see if that has an effect.  I suppose bad/intermittent memory would behave like that.  But I'm not aware of any other cases of this.  Where are you located?  We could loan you a replacement board if necessary.

A software bug is another possibility.  If you post your User Programs we can look them over for you.

What Version of KMotion are you running?  I assume you are running KMotionCNC?

Any other clues to what causes the failure?  Does it ever happen when sitting idle?  With Spindle off?  Anything special the GCode is doing when it occurs?

Regards

TK



Group: DynoMotion Message: 15003 From: tesotronics Date: 9/5/2017
Subject: Re: Incidental watchdog triggers
Last night I swapped out the KFLOP with another KFLOP from a known well working machine (my Wire EDM machine). The issue still persisted. I've been told that the Kanalog has also been replaced in the past without any luck.
Unfortunately the machine has been set up to be operated from Mach3. I'd like to move to KMotionCNC, which is possible, but quite a few machine functions (shifting, gear selection and such) have been implemented in VB scripts in Mach3. So if I want to use KMotionCNC, I'll have to rewrite a large part of the code.
The owner of the machine also prefers to use Mach3.
Is it possible that Mach3 can cause a watchdog trigger? If Mach3 is suspect, I can convince the owner to move to KMotionCNC and invest the time to completely rewrite the code in order to get rid of Mach3. However, it would really be a bummer if the issue then still persisted...
The issue only seems to manifest itself during active operation of the machine. It never occurs when the machine is idle.

There is another bug that seems to be related. Mach3 sometimes shows a message box, and when this happens a KFLOP crash is likely to follow soon. I've been digging through the Mach3 scripts, and this is the piece of code that produces the message box:

Dim RPM, SRO
RPM = GetRPM()
SRO = GetOEMDRO(74)
Sleep 100
While (GetOEMDRO(1301) <> 0) 'is not shifting
    Sleep 50
Wend
DoSpinCW()
NotifyPlugins(10603)
SetSpinSpeed(RPM)
SetOEMDRO(74, SRO)
Sleep 4000
If (IsActive(OemTrig5) = True) Then
    MsgBox("Spindle does not start, use S command.(M3)")
    DoOEMButton(1003)
    DoSpinStop()
    NotifyPlugins(10605)
End If
If (IsActive(OemTrig6) = False) Then
    MsgBox("Spindle does not start, use S command.(M3)")
    DoOEMButton(1003)
    DoSpinStop()
    NotifyPlugins(10605)
End If


So Mach3 starts the spindle, waits, and then reads a bit back from KFLOP. It expects one value but gets another, so I guess the read returns a wrong value. I still have to familiarize myself with Mach3 code; I don't know how OemTrigX is mapped to Kanalog bits.

I've attached the code that is running on KFLOP

 
Group: DynoMotion Message: 15004 From: Tom Kerekes Date: 9/5/2017
Subject: Re: Incidental watchdog triggers [3 Attachments]

Thanks for the clear info.

#1 This looks like a bug:

        printf("speed:" + (int)speed);

I believe this will compile to take the memory address of wherever the compiler put the string "speed:", add "speed" bytes to it, and print whatever garbage data is there, however long it is, and without a newline terminator.

It occurs 2 places in 4_NotifyMach3.c

The VB code you suspect is causing the problem is sending a Notify 10603 to execute that bad printf statement.

The correct way to print the speed as a float would be:

        printf("speed:%f\n", speed);

You might remove all print statements at least as a test.  The print statements are really only to be used for debugging.


#2 issue is that the 4_NotifyMach3.c program seems to be a mixture of a Notify program and a Spindle Program.  When the Plugin calls the Notify Program it only sets the Notify "msg", the Spindle message is not set.  And vice versa.  The previous message may still be there, but I'm not sure that would always be the case.  Also performing some function again may not be appropriate.

#3 you forgot to send us MH-C-V1.0.02.h

#4 what Version of Mach3 are you running?

#5 what Version of KMotion?

#6 what Version of Windows?

#7 please post a screen shot of your Mach3/Dynomotion Plugin Configuration Screen

Regards
TK




Group: DynoMotion Message: 15005 From: tesotronics Date: 9/6/2017
Subject: Re: Incidental watchdog triggers
Thanks for reviewing the code!
I inherited it and it is indeed messy. I will definitely apply your suggestions asap.

The machine uses:
KFLOP V4.33, with V4.34 USB driver
Win 10 Home
Mach3 V2.0

I will post a screenshot of the config screen when I have access to the machine again. I've added some additional files for review.

I'm not sure what you mean by "spindle message" or "previous spindle message". Where is that in the code of 4_NotifyMach3.c?
 
Group: DynoMotion Message: 15006 From: Tom Kerekes Date: 9/6/2017
Subject: Re: Incidental watchdog triggers [3 Attachments]

Yes indeed sloppy.  There is a nice function in Visual Studio -  Edit | Advanced | Format Selection that I use on code like this to make it somewhat readable.

4_NotifyMach3.c has code such as:

    //spindle settings
    int message = persist.UserData[0];          // Mach3 message ID
    int Direction = persist.UserData[1];  // Mach3 Spindle Direction
    float speed = *(float *)&persist.UserData[2];  // value stored is actually a float
    int DirFactor = 1;
    if (Direction==0) DirFactor=-1; // change Direction 0 or 1 to DirFactor -1 or +1
    if ( speed > 0.96 ) speed = 0.96 ;    // clamp speed to a maximum of 0.96
    printf("Mach3 Notify Message=%d, Direction=%2d, Spindle Set to %f\n",message,Direction,speed);

and

    if (message==5)                            //sp cw (only for JO jansen)

None of these spindle variables are set or necessarily valid when the plugin runs the Notify Program.

I suppose the assumption is that the persist variables were set to something valid when a previous Spindle Program was invoked.

Normally the Mach3 version is a number like 3.043.066

Regards
TK


Group: DynoMotion Message: 15007 From: tesotronics Date: 9/7/2017
Subject: Re: Incidental watchdog triggers
Hi Tom,

Just got back from my customer. I removed all printf's in the code. Mach3 does not give the pop-up anymore, so that's better. Unfortunately however, the incidental watchdog trigger still happens. It seems to happen around the same time when the spindle is started.

I've added a screenshot of the KMotion Mach3 plugin configuration. I noticed that the guy who built the machine entered the "4_NotifyMach3.c" program as the spindle speed user program as well as the custom notify program in the plugin. I only removed the printf's and kept the rest of the code the same.

What could cause an incidental SW crash? Is there some strategy to take to find this bug?

At this moment I'm so frustrated that I'm considering dumping all the existing code from the guy who built the machine, starting from scratch with my own code, and ditching Mach3 as well. However, that requires serious effort which I would need to charge to my customer, and he won't be happy. I'd rather apply a patch.
Group: DynoMotion Message: 15008 From: Tom Kerekes Date: 9/7/2017
Subject: Re: Incidental watchdog triggers [1 Attachment]

As expected the Plugin Configuration is running the same program in the same Thread for two different purposes.  That seems like a bad idea to me.  Also the Spindle Configuration is set to Download only once.  I can't really think of anything specific of how that would cause a crash, but it's hard to think through every scenario.

For example having a Notify Message download the program into Thread #4 at the same time a Spindle message Triggers an execution of the same Thread #4 assuming the program is already loaded seems dangerous.  But loading the same program that is already there shouldn't effectively make any change so I suppose it shouldn't matter.

Also, as I described before, the single program looks at both Notify Commands in Variable 6 and Spindle Commands in Variables 0+, even though it was only invoked either to do Notify stuff with a command set in Variable 6 or to do Spindle stuff with a command set in Variable 0.

Also using the same Thread for the two functions means that depending on the timing one function might not be completed before the other function kills the Thread and restarts it.  That might leave some operation only partially performed and potentially in some indeterminate state.

It probably has a low percentage chance of being the problem but if I were you I would eliminate this issue.

At a minimum have the two functions run in two different Threads.  I would also uncheck the download only once option.


Regarding strategy: you might run dry runs with no power to the Spindle and/or motors to see if the problem still occurs.  That would be a big clue to whether it is a hardware noise problem or some software bug.  Based on the result other tests could be created to narrow it down further.  About how long on average does it take to induce a fault/crash?


You might post your Mach3 XML for us to look at.


What Version of Mach3 are you running?

Regards

TK




Group: DynoMotion Message: 15009 From: tesotronics Date: 9/7/2017
Subject: Re: Incidental watchdog triggers
Hi Tom,

Thanks for your quick replies, really appreciate that.

The machine uses Mach Version R3.043.066, I've attached the XML.

I'll split off the part that handles the spindle into a separate thread, the way it's supposed to be. Then I'll test with the spindle on, and if the issue persists I'll test with the spindle disabled.

Yesterday something remarkable happened. For testing, I use a small G-code program with only ~10 lines. I got fed up with waiting and pressing the start button manually for each test iteration, so I made a new large G-code file that has the same 10 line program copy-pasted into it 100 times, just repeating the 10 line program over and over. When I ran that, not a single error occurred, so I thought the issue was solved (I was not able to run more than 30 iterations reliably before the code modification). Then, just to be sure, I ran the small 10 line program again a few times by manually pressing start, and the watchdog tripped after only a few iterations.

It might have been luck that the 100 time automatic iteration went flawless but I have a feeling it's a hint...

The only difference between running the test iterations automatically and pressing start manually each time is timing between two consecutive starts of the code. And if timing affects it, it has to be a software bug...

What do you think?
 
Group: DynoMotion Message: 15014 From: Tom Kerekes Date: 9/8/2017
Subject: Re: Incidental watchdog triggers [1 Attachment]

Interesting.  Does the Spindle go off and on each cycle the same way with 100X file like it would with the 1X file?

I don't see anything striking in the XML file.

Regards
TK



Group: DynoMotion Message: 15015 From: tesotronics Date: 9/8/2017
Subject: Re: Incidental watchdog triggers [1 Attachment]
Yeah, the spindle is started and stopped each cycle.
Group: DynoMotion Message: 15041 From: tesotronics Date: 9/24/2017
Subject: Re: Incidental watchdog triggers
Hi Tom,

The issue has been resolved.

I got fed up with debugging someone else's crappy and buggy code, especially since the most serious bug (incidental watchdog triggers) was very elusive. Instead, I ditched all the code and completely started over, which also gave me the chance to get rid of Mach3, since I much prefer KMotionCNC.

Now everything runs on KMotionCNC, and it runs super reliably and smoothly! So although I cannot pinpoint where the bug was, I got rid of it.

Thanks for your help, it definitely pointed me in the right direction.
Group: DynoMotion Message: 15042 From: Tom Kerekes Date: 9/25/2017
Subject: Re: Incidental watchdog triggers
Sorry you had to go through such an ordeal.

Thanks for posting the update.

Good luck!
TK
